On the Coverage of a Morphological Analyser based on "Svensk Ordbok" [A Dictionary of Swedish]
Abstract
In the project A Lexicon-oriented Parser for Swedish, a stem dictionary (Sågvall Hein & Sjögreen 1991; Sjögreen, forthcom.) covering the 58,536 entry lemmas of Svensk Ordbok (1986), along with a complete inflectional grammar of Swedish (Sågvall Hein, forthcom.), was generated. This language description, together with the Uppsala Chart Processor, UCP (Sågvall Hein 1987), constitutes a morphological analyser of Swedish, henceforth referred to as SMU, short for Swedish Morphology in the UCP framework. So far, there are no word formation rules in the SMU grammar, and words outside the scope of Svensk Ordbok don't get an analysis. Even though closed in its present version, the coverage of SMU is well-defined; prior to any processing we may consult Svensk Ordbok to find out for any word form whether it will get an analysis or not; the dictionary provides an intuitive, familiar format through which we may explore the (present) competence of the SMU analyser without any prior knowledge of its formalisms or operation. SMU is also well-defined in the sense that, for any of its lemmas, Svensk Ordbok provides links to the corresponding lexemes (basic senses), and for each lexeme a definition. In our ongoing work on a machine-tractable dictionary for Swedish, we are approaching problems concerning the distinction between general and domain-specific vocabulary, and the present coverage of SMU is our starting-point for delimiting a general Swedish vocabulary. For an evaluation of the generality of the dictionary, the analyser has been applied to different sets of Swedish text. For one of them, consisting of the 10,224 most frequent types of the 7.3 million word newspaper corpus of The Language Bank (Gellerstam 1989), the words outside the scope of the analyser have been examined in some detail. Here we will present the results achieved so far, and also discuss their impact on our continued work on the dictionary. First, however, we will briefly characterize the SMU analyser with regard to morphological descriptions and dictionary representation of inflection.